“lyrics.csv” is a filtered corpus of 380,000+ song lyrics from MetroLyrics. You can read more about it on Kaggle.
Love is a timeless topic in songs, regardless of genre or decade. A love song is a song about being in love, falling in love, heartbreak at the end of a love, and the feelings that these experiences bring. Here I focus on “love” songs, defined as songs whose lyrics contain the word “love” more than once, and explore their true topics and emotions.
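The selection rule can be sketched as follows. This is a minimal Python illustration, not the code used for this analysis. It assumes case-insensitive, whole-word matching (so “lovely” or “loved” do not count), which is a choice the text does not specify. The names `count_love` and `is_love_song` are hypothetical.

```python
import re

def count_love(lyrics: str) -> int:
    """Count whole-word occurrences of "love", ignoring case (assumed matching rule)."""
    return len(re.findall(r"\blove\b", lyrics.lower()))

def is_love_song(lyrics: str) -> bool:
    """A "love" song contains the word "love" more than once."""
    return count_love(lyrics) > 1

# Made-up lyrics, for illustration only
examples = [
    "Love me, love me not",
    "I love you",
    "A lovely glove",
]
love_songs = [text for text in examples if is_love_song(text)]
```

Only the first made-up lyric passes the filter, because it is the only one where “love” appears at least twice as a separate word.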
From the plot we can see that the median lyric length in Hip-Hop is much larger than in any other genre. This is reasonable because Hip-Hop songs often require the performer to rap at high speed, which gives them the chance to fit in more words. Most values fall in a narrow range below 1000. But there are still many extreme values, especially in Hip-Hop, Pop, and Rock.
From the plot we can see that the distributions look very similar to one another. Still, there is a slight trend: as time went by, the lengths of the songs increased.
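For readers who want to reproduce the comparison, here is a minimal Python sketch of a median lyric length per genre. It assumes length is measured in whitespace-separated words, which the text does not state. The function name `median_length_by_genre` and the sample data are hypothetical.

```python
from collections import defaultdict
from statistics import median

def median_length_by_genre(songs):
    """songs: iterable of (genre, lyrics) pairs. Returns {genre: median word count}."""
    lengths = defaultdict(list)
    for genre, lyrics in songs:
        # Assumption: length = number of whitespace-separated tokens
        lengths[genre].append(len(lyrics.split()))
    return {genre: median(counts) for genre, counts in lengths.items()}

# Made-up data, for illustration only
sample = [
    ("Hip-Hop", "a b c d e f"),
    ("Hip-Hop", "a b c d"),
    ("Pop", "a b"),
    ("Pop", "a b c"),
    ("Pop", "a"),
]
medians = median_length_by_genre(sample)
```

The median is a natural summary here because, as noted above, the distributions have many extreme values that would pull a mean upward.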
Here I use the pirateplot function from the yarrr package to visualize lexical diversity as one way to measure how “love” songs have developed.
Every colored circle in this pirate plot represents a song. The line for each decade represents the mean distinct word count. Overall, lexical diversity did not change much over the years, but the number of large extreme values increased, indicating that some songwriters tend to use a wider variety of words in “love” songs.
I use term frequency (TF) and term frequency-inverse document frequency (TF-IDF) to identify the key words for “love” songs in each decade.
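The quantity behind the plot, the number of distinct words per song and its mean per decade, can be sketched like this. It is a minimal Python illustration rather than the original plotting code. The tokenization rule (lowercase runs of letters and apostrophes) and the names `distinct_word_count` and `mean_diversity_by_decade` are assumptions.

```python
import re
from statistics import mean

def distinct_word_count(lyrics: str) -> int:
    """Number of distinct words in one song (assumed tokenization: letters and apostrophes)."""
    return len(set(re.findall(r"[a-z']+", lyrics.lower())))

def mean_diversity_by_decade(songs):
    """songs: iterable of (year, lyrics) pairs. Returns {decade: mean distinct word count}."""
    by_decade = {}
    for year, lyrics in songs:
        decade = year // 10 * 10  # e.g. 1975 -> 1970
        by_decade.setdefault(decade, []).append(distinct_word_count(lyrics))
    return {decade: mean(counts) for decade, counts in sorted(by_decade.items())}

# Made-up data, for illustration only
sample = [
    (1975, "love love me do"),
    (1978, "la la la"),
    (2012, "we found love"),
]
diversity = mean_diversity_by_decade(sample)
```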
\[ \begin{aligned} \mathrm{TF}(t) = \frac{\text{number of times word } t \text{ appears in a song}}{\text{total number of words in the song}} \end{aligned} \]
\[ \begin{aligned} \mathrm{IDF}(t) = \ln \frac{\text{total number of songs}}{\text{number of songs containing word } t} \end{aligned} \]
\[ \begin{aligned} \text{TF-IDF}(t) = \mathrm{TF}(t) \cdot \mathrm{IDF}(t) \end{aligned} \]
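To make the three formulas concrete, here is a minimal Python sketch that applies them to a tiny made-up corpus. It treats each song as one document and reads the IDF numerator as the total number of songs, which is my assumption about the intended definition. The helper names `tf`, `idf`, and `tf_idf` are hypothetical, and this is not the code used to build the word clouds.

```python
import math
from collections import Counter

def tf(tokens):
    """Term frequency: count of each word divided by the song's total word count."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: count / total for word, count in counts.items()}

def idf(docs):
    """Inverse document frequency: ln(number of songs / number of songs containing the word)."""
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each word once per song
    return {word: math.log(n_docs / df) for word, df in doc_freq.items()}

def tf_idf(docs):
    """TF-IDF score of every word in every song."""
    idf_scores = idf(docs)
    return [
        {word: value * idf_scores[word] for word, value in tf(tokens).items()}
        for tokens in docs
    ]

# Made-up, pre-tokenized songs, for illustration only
songs = [
    ["love", "love", "baby"],
    ["love", "heart"],
    ["love", "surabaya", "surabaya", "surabaya"],
]
scores = tf_idf(songs)
```

In this toy corpus “love” appears in every song, so its IDF is ln(3/3) = 0 and its TF-IDF is 0 no matter how often it is repeated. A word that is frequent in one song but absent from the others, like “surabaya” here, gets a high score. This is why a TF ranking and a TF-IDF ranking can look very different.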
\(1970S\)
The top word cloud shows the words with the 20 highest TF values in 1970s “love” songs. The bottom one shows the words with the 20 highest TF-IDF values. As expected, love, baby, and women are very common words in 1970s love songs. But when we look at TF-IDF, the picture changes completely. The top words include representative love song singers such as Deanie and Eydie, and also locations that might be famous for romantic love stories, such as Surabaya.
\(1980S\)
Similarly, by TF, love, baby, and heart are very common words in 1980s love songs. By TF-IDF, the top words include some with no real meaning, such as whoahohoho, whoaoo, and hooee.
\(1990S\)
For the 1990s, the important words (by TF-IDF) are very similar to those of the 1980s, suggesting that the content of “love” songs did not change much over those 20 years.
\(2000S\)
\(2010S\)
We can see that the TF-IDF word clouds for the 2000s and 2010s are very similar to each other and very different from the earlier clouds. “Love” songs tend to be more colloquial. This is probably because, as society has developed, people have become braver and less constrained in speaking out what they think deep in their hearts.
I used two sentiment datasets from the tidytext package.
We can see that over the past 50 years, the majority of love songs have a positive attitude and express joy. Still, many songs express sadness and negative emotions. Overall, when singers sing about love, the theme of the song is usually happy and full of joy.
An interesting observation: there is a slight downward trend in the positive percent of love song words as time goes by. In the 1970s and 1980s, over 60% of love songs sing about positive emotions. People at that time might have had a more positive attitude toward love. Nowadays, more and more singers tend to sing about negative love stories.
Across music genres, Metal and Hip-Hop love songs have a markedly lower positive percent of lyric words. This might be because of their development history. Jazz has the highest positive percent of lyric words.
The radar plot strengthens the conclusion that love songs in the 1970s and 1980s express a more positive attitude, since they have the lowest proportions of sadness, anger, and fear, and the highest proportion of joy. Love songs from the 2000s and 2010s have the highest proportions of fear and anger, the opposite of the older love songs.
R&B and Jazz “love” songs have the highest proportions of joy. Jazz also has the lowest proportion of anger. Hip-Hop and Metal “love” songs have the highest proportions of anger and fear. Joy and anger appear to behave as a pair of opposite emotions.
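One way to compute a “positive percent” of lyric words is sketched below in Python. It assumes the percentage is taken over the words that match a positive/negative lexicon, which is my reading rather than something the text states. `TOY_LEXICON` is a hypothetical six-word stand-in for a real sentiment lexicon, and `positive_percent` is a hypothetical name.

```python
import re

# Hypothetical toy lexicon, for illustration only; a real analysis would load a full lexicon
TOY_LEXICON = {
    "love": "positive",
    "happy": "positive",
    "sweet": "positive",
    "cry": "negative",
    "pain": "negative",
    "lonely": "negative",
}

def positive_percent(lyrics_list, lexicon=TOY_LEXICON):
    """Percent of lexicon-matched words that are positive, across a group of songs.

    Returns None when no word in the group matches the lexicon.
    """
    positive = negative = 0
    for lyrics in lyrics_list:
        for word in re.findall(r"[a-z']+", lyrics.lower()):
            label = lexicon.get(word)
            if label == "positive":
                positive += 1
            elif label == "negative":
                negative += 1
    matched = positive + negative
    return 100.0 * positive / matched if matched else None

# Made-up lyrics grouped by decade, for illustration only
by_decade = {
    "1970s": ["sweet love love", "happy love cry"],
    "2010s": ["lonely pain love", "cry cry love"],
}
trend = {decade: positive_percent(songs) for decade, songs in by_decade.items()}
```

Words that are not in the lexicon are ignored, so the result depends heavily on which lexicon is used and how well it covers song vocabulary.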
Important words in the lyrics of “love” songs differ a lot between 1970-1989 and 2000-2019. Songwriters in earlier times preferred writing romantic love stories, while songwriters nowadays prefer talking about their true feelings.
If someone wants to fit into the new era of love songs, maybe she or he can write a sad love story song.
Songwriters in different genres have different preferences for the kind of love story they tell. For Jazz love songs, happiness is most common. For Hip-Hop, anger is more common.